Random high CPU usage, typing lag, and heavy TCP send/receive traffic to/from localhost for hours in a Remote Desktop environment
Categories
(Thunderbird :: Untriaged, defect)
Tracking
(Not tracked)
People
(Reporter: duparchy, Unassigned)
Details
(Keywords: perf)
Attachments
(5 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0
Steps to reproduce:
Nothing special
Actual results:
Thunderbird has been consuming CPU for hours now.
Monitoring the TB process with Process Monitor (procmon) on Windows, Thunderbird seems to be stuck in a loop of TCP Send/Receive operations to and from localhost.
What is the purpose of these loopback transmissions?
In fact, the real problem, which may not be related to these loopback connections, is that users are experiencing typing lag (again...).
Comment 2•4 years ago
Are these same users on RDS?
Hi, yes. That's my first attempt to understand the bug/feature before opening that case.
I manage 6 Windows 2019 RDSH servers and ~70 users.
This bug occurs randomly, for different users and on different servers.
I will have to revert to TB 60 (again...).
Hi,
As far as I can tell from the user's point of view, this is not related to bug 166881.
I think we have a bug here that triggers a 20-year-old questionable design (loopback TCP connections).
The result is a kind of denial-of-service attack.
Besides the bug, the inter-process loopback connection does not seem to be a true loopback to localhost.
Or is it just Process Monitor that translates "localhost" to the FQDN?
This matters because true loopback connections are supposed to be optimized for inter-process communication. See https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/hh997026(v=ws.11)
I've seen that TB 91 is now multi-process. Is there a chance that this inter-process communication through the loopback (or pseudo-loopback) interface has been improved or redesigned?
I uploaded another capture of this TCP loopback flooding. Note that it accounts for 62% of all I/O events on the server. It looks like a true denial of service.
(Note: I checked with netstat -ano. This is a true loopback (127.0.0.1); procmon.exe translates it to the FQDN.)
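The netstat check above can be scripted. The Python below is a minimal sketch, assuming the English-layout `netstat -ano` columns (Proto, Local Address, Foreign Address, State, PID): it keeps TCP rows whose two endpoints are both loopback addresses and groups them by PID, which is one way to confirm which process owns the 127.0.0.1 traffic. The sample output and PIDs are made up for illustration.

```python
# Sketch: loopback-to-loopback TCP connections grouped by PID, parsed from
# Windows `netstat -ano` output.
# Assumption: TCP rows have exactly five columns in the order
# Proto, Local Address, Foreign Address, State, PID.
import ipaddress
from collections import defaultdict


def is_loopback(endpoint: str) -> bool:
    """True if an "address:port" endpoint is in 127.0.0.0/8 or is [::1]."""
    host, _, _port = endpoint.rpartition(":")
    host = host.strip("[]").split("%", 1)[0]  # "[::1]" -> "::1", drop zone id
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def loopback_tcp_by_pid(netstat_output: str) -> dict[int, list[tuple[str, str, str]]]:
    """Map PID -> [(local, foreign, state)] for loopback-to-loopback TCP rows."""
    by_pid: dict[int, list[tuple[str, str, str]]] = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP":
            continue  # skip headers, UDP rows and blank lines
        _proto, local, foreign, state, pid = fields
        if is_loopback(local) and is_loopback(foreign):
            by_pid[int(pid)].append((local, foreign, state))
    return dict(by_pid)


# Made-up sample output, only to exercise the parser.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       1012
  TCP    127.0.0.1:55238        127.0.0.1:55239        ESTABLISHED     4321
  TCP    127.0.0.1:55239        127.0.0.1:55238        ESTABLISHED     4321
  TCP    10.0.0.5:52682         10.0.0.9:443           ESTABLISHED     4321
  UDP    0.0.0.0:5353           *:*                                    2000
"""

if __name__ == "__main__":
    for pid, conns in loopback_tcp_by_pid(SAMPLE).items():
        print(pid, conns)
```

On the server itself, the input could come from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`, and the PIDs can then be matched against the Thunderbird processes shown in Task Manager.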
Upgraded to TB 91.1.
This TCP loopback connection bug/feature persists.
Occasionally, a user's TB process seemingly goes amok.
Comment 10•4 years ago (Reporter)
This TCP loopback connection bug/feature hasn't shown up for days now. Maybe it's gone.
Monitoring TB activity with procmon, I still see tons of registry queries, all identical and in a row. There is perhaps room for improvement here.
What is the purpose of these dozens of RegQueryValue calls on:
HKLM\SOFTWARE\Microsoft\Input\InputServiceEnabledForCCI
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE\LaunchUserOOBE
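One way to quantify "all the same in a row" is to collapse consecutive identical events. The Python below is a sketch, assuming a procmon capture exported to CSV with "Process Name", "Operation" and "Path" columns (an assumption about the export layout); it uses `itertools.groupby` to report runs of identical back-to-back RegQueryValue events for one process. The sample rows reuse the two keys above, but the counts are invented.

```python
# Sketch: runs of identical, back-to-back registry queries in a
# Process Monitor CSV export.
# Assumption: the export has "Process Name", "Operation" and "Path" columns.
import csv
import io
from itertools import groupby


def repeated_runs(csv_text, process, operation="RegQueryValue", min_run=2):
    """Return [(path, run_length)] for consecutive identical `operation`
    events of `process`, keeping only runs of at least `min_run` events."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Keep only this process's events so other processes don't break a run.
    keys = ((r["Operation"], r["Path"]) for r in rows if r["Process Name"] == process)
    runs = []
    for (op, path), group in groupby(keys):
        length = sum(1 for _ in group)
        if op == operation and length >= min_run:
            runs.append((path, length))
    return runs


# Made-up sample rows; backslashes are literal because this is a raw string.
SAMPLE = r'''"Process Name","Operation","Path"
"thunderbird.exe","RegQueryValue","HKLM\SOFTWARE\Microsoft\Input\InputServiceEnabledForCCI"
"thunderbird.exe","RegQueryValue","HKLM\SOFTWARE\Microsoft\Input\InputServiceEnabledForCCI"
"thunderbird.exe","RegQueryValue","HKLM\SOFTWARE\Microsoft\Input\InputServiceEnabledForCCI"
"explorer.exe","RegQueryValue","HKCU\Software\Example"
"thunderbird.exe","RegQueryValue","HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE\LaunchUserOOBE"
"thunderbird.exe","RegQueryValue","HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE\LaunchUserOOBE"
"thunderbird.exe","TCP Send","localhost:55238 -> localhost:55239"
'''

if __name__ == "__main__":
    for path, n in repeated_runs(SAMPLE, "thunderbird.exe"):
        print(n, path)
```

Filtering by process before grouping is a deliberate choice: interleaved events from other processes would otherwise split what is, from Thunderbird's point of view, one uninterrupted run.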
Comment 11•4 years ago (Reporter)
One occurrence today of these seemingly infinite TCP loopback connections.
So it's still there...
Comment 12•4 years ago (Reporter)
Again...
I checked the user's settings.
- Two IMAP accounts, both set not to synchronize locally
- No add-ons
- No Global indexer.
Comment 13•4 years ago (Reporter)
Is there something I can do to help resolve this bug?
Logging?
Comment 14•4 years ago (Reporter)
For one user for whom this problem occurs frequently, I created a new profile from scratch.
Problem NOT fixed.
It seems to be even worse. Instead of taking 6% (of 14 vCPUs, i.e. ~85% of one CPU), the TB process lost in loopback connections now reaches 11%.
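For reference, a percentage of total CPU on a host with N vCPUs corresponds to that fraction times N single-vCPU equivalents. Assuming the percentages above are of the whole 14-vCPU server, as the comment states, the figures work out as:

```latex
\[
0.06 \times 14 = 0.84 \approx 85\%\ \text{of one vCPU},
\qquad
0.11 \times 14 = 1.54 \approx 1.5\ \text{vCPUs}
\]
```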
Comment 15•4 years ago (Reporter)
- a second process taking 5% (of 14 vCPUs).
This causes problems whenever "real-time" networking is required: other users on Zoom, Teams, etc. are experiencing audio/video problems.
Comment hidden (obsolete)
Comment 17•4 years ago (Reporter)
up.
Is there something I can do to help resolve this bug?
Logging?
Comment 18•4 years ago (Reporter)
There's a 99.99% chance that this process, which permanently takes 6% of 14 CPUs, is lost in the "localhost" loop.
Comment 19•4 years ago (Reporter)
These are TB processes for 9 different users up there.
Comment 20•4 years ago
Please try version 91 with Help > Troubleshoot Mode.
Comment 21•4 years ago (Reporter)
I already tested a newly created profile for a user (without extensions).
So unless there's something else I can diagnose in Troubleshoot Mode, I don't think it's worth disturbing a user.
As I said, this problem, harmless as it looks, is in fact a kind of denial of service and slows down the entire server.
It's not just me: this is randomly killing one server after another in an entire RDSH farm with 8 servers and 85 users.
Anyway, Thunderbird also makes way too many disk accesses for a cloud infrastructure using a shared iSCSI or FC storage array.
Unless steps are taken to improve this situation, we won't be using it for long. This is sad.
Comment 22•4 years ago (Reporter)
TB 91.3. No improvement.
Two Thunderbird processes, for two different users, account for 54% of all events on that server.
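Figures like this 54% (and the 62% in the earlier capture) are shares of all events in a Process Monitor capture. As a rough sketch for reproducing such numbers from a capture exported to CSV, the Python below counts events per process and the share that are TCP operations of one process. The "Process Name" and "Operation" column names and the "TCP " operation prefix (e.g. "TCP Send", "TCP Receive") are assumptions about the export; the sample rows are made up.

```python
# Sketch: per-process share of events in a Process Monitor CSV export.
# Assumption: the export has "Process Name" and "Operation" columns and
# network events use operation names starting with "TCP ".
import csv
import io
from collections import Counter


def _rows(csv_text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(csv_text)))


def event_share(csv_text: str) -> dict[str, float]:
    """Percentage of all captured events attributed to each process."""
    counts = Counter(row["Process Name"] for row in _rows(csv_text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


def tcp_share(csv_text: str, process: str) -> float:
    """Percentage of all captured events that are TCP operations of `process`."""
    rows = _rows(csv_text)
    if not rows:
        return 0.0
    tcp = sum(
        1
        for row in rows
        if row["Process Name"] == process and row["Operation"].startswith("TCP ")
    )
    return 100.0 * tcp / len(rows)


# Made-up sample rows, only to exercise the functions.
SAMPLE = '''"Process Name","Operation","Path"
"thunderbird.exe","TCP Send","localhost:55238 -> localhost:55239"
"thunderbird.exe","TCP Receive","localhost:55239 -> localhost:55238"
"thunderbird.exe","TCP Send","localhost:55238 -> localhost:55239"
"explorer.exe","RegQueryValue","HKCU\\Software\\Example"
'''

if __name__ == "__main__":
    print(event_share(SAMPLE))  # {'thunderbird.exe': 75.0, 'explorer.exe': 25.0}
    print(tcp_share(SAMPLE, "thunderbird.exe"))  # 75.0
```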
Comment 23•4 years ago (Reporter)
No improvement with 91.3.
Comment 24•3 years ago
(In reply to duparchy from comment #7)
I've seen that TB 91 is now multi-process. Is there a chance that this inter-process communication through the loopback (or pseudo-loopback) interface has been improved or redesigned?
Not AFAIK, which your testing confirms. No idea where this traffic is coming from. Maybe Magnus has an idea.
Anyway, Thunderbird also makes way too many disk accesses for a cloud infrastructure using a shared iSCSI or FC storage array.
Yes, this has been true for many years, and debilitating in some cases. Is there any possibility of putting Thunderbird data on the server's local disk (for example, a disk local to the hypervisor), which should help?
I mention this because there will be no relief coming from Thunderbird until the buffering issues are fixed by benc's refactoring and Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends.
Comment 25•3 years ago (Reporter)
Hi,
The idea behind a "cloud" infrastructure is that everything is backed up at the storage-array level.
Plus, there is some level of high availability (Live volume, etc.).
Not only would moving some data to local disks be cumbersome (editing everyone's Thunderbird to move their profile), but it would defeat the entire "cloud" idea.
In addition, that would be impossible once we move our private cloud to a cloud provider (AWS, Azure, etc.).
For better or worse, we are living in the "cloud" days, at least for professionals.
Developers should stop assuming that everyone sits beside their own brick-and-mortar eight-core machine with 32 GB of RAM.
Thanks for trying to pass the idea on to whom it may concern.
And thanks for your support.
Comment 26•3 years ago
No idea what would cause it, but probably a dupe of bug 1732926. Try bug 1732926 comment 15 and report back there.
Comment 27•3 years ago (Reporter)
Yes, I could try disabling multi-process. But given the fact that feature/bug was present before TB 91 multi-process, I doubt it will have any effect.
Comment 28•3 years ago
Just to clarify ... this issue doesn't exist for you in version 68?
The port numbers are 55238, 55239, 52682, 52689?
(In reply to duparchy from comment #27)
that feature/bug was present before TB 91 multi-process
True, but it will at least remove one variable from the diagnosis process. Lest we forget about it, I suggest that it stay disabled until all your problems are resolved.
Comment 29•3 years ago (Reporter)
Maybe it simply went unnoticed in TB 60, but users didn't complain about performance and lag.
Right now, three people in a row on the same server have the "send-receive gone crazy" problem.
Comment 30•3 years ago (Reporter)
Comment 31•3 years ago (Reporter)
Still there. (Not checked with TB 100+ though.)
Most of the time it goes unnoticed because we're on a 10 Gb network with 16 CPUs.
That is, until high CPU/network load (several Zoom calls in a row; we're talking about an RDSH server with many users) reveals the underlying problem.
Comment 32•3 years ago
Reporter, does this still fail for you when using version 102 or newer?
Comment 33•3 years ago (Reporter)
Hi,
I rolled it out to our RDSH servers last week, so I can't tell for sure yet whether the problem is gone.
TB 102 seems to perform much better.
I/O is definitely improved.
Still, I see some dubious I/O through localhost TCP. But I haven't seen any TB process going crazy so far.
Is there any information about underlying changes that would make us confident the problem is resolved?
Comment 34•2 years ago
Resolved per whiteboard
Comment 35•2 years ago (Reporter)
Hi,
Here we go again.
Upgraded to 115.3.1
To help a user, I explained to him how to do a "Repair Folder" and did it on my own Inbox (10K messages).
TB has now been eating my CPU with TCP Sends/Receives for 18 hours.